Growing Multi-Domain Glossaries from a Few Seeds using Probabilistic Topic Models
Abstract
In this paper we present a minimally supervised approach to the multi-domain acquisition of wide-coverage glossaries. We start from a small number of hypernymy relation seeds and bootstrap glossaries from the Web for dozens of domains using Probabilistic Topic Models. Our experiments show that we are able to extract high-precision glossaries comprising thousands of terms and definitions.
Similar resources
GlossBoot: Bootstrapping Multilingual Domain Glossaries from the Web
We present GlossBoot, an effective minimally-supervised approach to acquiring wide-coverage domain glossaries for many languages. For each language of interest, given a small number of hypernymy relation seeds concerning a target domain, we bootstrap a glossary from the Web for that domain by means of iteratively acquired term/gloss extraction patterns. Our experiments show high performance in ...
Multi-document Summarization using Probabilistic Topic-based Network Models
Multi-document summarization has obtained much attention in the research domain of text summarization. In the past, probabilistic topic models and network models have been leveraged to generate summaries. However, previous studies do not investigate different combinations of various topic models and network models. This paper describes an integrated approach considering both probabilistic topic...
From Glossaries to Ontologies: Disaster Management Domain
Our society’s reliance on a variety of critical infrastructures (CI) presents significant challenges for disaster preparedness, response and recovery. Experts from different domains, including police, paramedics, firefighters and various other CI teams, are involved in the fast-paced response to a disaster, increasing the risk of miscommunication. To ensure clear communication, as well as to faci...
A Probabilistic Topic Model Based on Local Word Relationships in Overlapping Windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process: given the documents, it extracts the topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
Sustainable Supply Chain Network Design: A Review on Quantitative Models Using Content Analysis
The purpose of this paper is to develop a systematic literature review on the subject of sustainable supply chain network design during 1990-2016, through a review of 261 papers. In this study, a qualitative technique for conducting a systematic literature review was used. To systematize the literature review and make it more accurate, the content analysis method was used, which includes data collect...